Goto

Collaborating Authors

 North Slope Borough




Recent Advances in Simulation-based Inference for Gravitational Wave Data Analysis

Liang, Bo, Wang, He

arXiv.org Machine Learning

The detection of gravitational waves by the LIGO-Virgo-KAGRA collaboration has ushered in a new era of observational astronomy, emphasizing the need for rapid and detailed parameter estimation and population-level analyses. Traditional Bayesian inference methods, particularly Markov chain Monte Carlo, face significant computational challenges when dealing with the high-dimensional parameter spaces and complex noise characteristics inherent in gravitational wave data. This review examines the emerging role of simulation-based inference methods in gravitational wave astronomy, with a focus on approaches that leverage machine-learning techniques such as normalizing flows and neural posterior estimation. We provide a comprehensive overview of the theoretical foundations underlying various simulation-based inference methods, including neural posterior estimation, neural ratio estimation, neural likelihood estimation, flow matching, and consistency models. We explore the applications of these methods across diverse gravitational wave data processing scenarios, from single-source parameter estimation and overlapping signal analysis to testing general relativity and conducting population studies. Although these techniques demonstrate speed improvements over traditional methods in controlled studies, their model-dependent nature and sensitivity to prior assumptions are barriers to their widespread adoption. Their accuracy, which is similar to that of conventional methods, requires further validation across broader parameter spaces and noise conditions.


Evaluating the Robustness of Dense Retrievers in Interdisciplinary Domains

Chaturvedi, Sarthak, Acharya, Anurag, Meyur, Rounak, Hayashi, Koby, Munikoti, Sai, Horawalavithana, Sameera

arXiv.org Artificial Intelligence

Evaluation benchmark characteristics may distort the true benefits of domain adaptation in retrieval models. This creates misleading assessments that influence deployment decisions in specialized domains. We show that two benchmarks with drastically different features such as topic diversity, boundary overlap, and semantic complexity can influence the perceived benefits of fine-tuning. Using environmental regulatory document retrieval as a case study, we fine-tune ColBERTv2 model on Environmental Impact Statements (EIS) from federal agencies. We evaluate these models across two benchmarks with different semantic structures. Our findings reveal that identical domain adaptation approaches show very different perceived benefits depending on evaluation methodology. On one benchmark, with clearly separated topic boundaries, domain adaptation shows small improvements (maximum 0.61% NDCG gain). However, on the other benchmark with overlapping semantic structures, the same models demonstrate large improvements (up to 2.22% NDCG gain), a 3.6-fold difference in the performance benefit. We compare these benchmarks through topic diversity metrics, finding that the higher-performing benchmark shows 11% higher average cosine distances between contexts and 23% lower silhouette scores, directly contributing to the observed performance difference. These results demonstrate that benchmark selection strongly determines assessments of retrieval system effectiveness in specialized domains. Evaluation frameworks with well-separated topics regularly underestimate domain adaptation benefits, while those with overlapping semantic boundaries reveal improvements that better reflect real-world regulatory document complexity. Our findings have important implications for developing and deploying AI systems for interdisciplinary domains that integrate multiple topics.


Mapping bathymetry of inland water bodies on the North Slope of Alaska with Landsat using Random Forest

Carroll, Mark L., Wooten, Margaret R., Simpson, Claire E., Spradlin, Caleb S., Frost, Melanie J., Blanco-Rojas, Mariana, Williams, Zachary W., Caraballo-Vega, Jordan A., Neigh, Christopher S. R.

arXiv.org Artificial Intelligence

The North Slope of Alaska is dominated by small waterbodies that provide critical ecosystem services for local population and wildlife. Detailed information on the depth of the waterbodies is scarce due to the challenges with collecting such information. In this work we have trained a machine learning (Random Forest Regressor) model to predict depth from multispectral Landsat data in waterbodies across the North Slope of Alaska. The greatest challenge is the scarcity of in situ data, which is expensive and difficult to obtain, to train the model. We overcame this challenge by using modeled depth predictions from a prior study as synthetic training data to provide a more diverse training data pool for the Random Forest. The final Random Forest model was more robust than models trained directly on the in situ data and when applied to 208 Landsat 8 scenes from 2016 to 2018 yielded a map with an overall $r^{2}$ value of 0.76 on validation. The final map has been made available through the Oak Ridge National Laboratory Distribute Active Archive Center (ORNL-DAAC). This map represents a first of its kind regional assessment of waterbody depth with per pixel estimates of depth for the entire North Slope of Alaska.


RAG vs. Long Context: Examining Frontier Large Language Models for Environmental Review Document Comprehension

Phan, Hung, Acharya, Anurag, Chaturvedi, Sarthak, Sharma, Shivam, Parker, Mike, Nally, Dan, Jannesari, Ali, Pazdernik, Karl, Halappanavar, Mahantesh, Munikoti, Sai, Horawalavithana, Sameera

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have been applied to many research problems across various domains. One of the applications of LLMs is providing question-answering systems that cater to users from different fields. The effectiveness of LLM-based question-answering systems has already been established at an acceptable level for users posing questions in popular and public domains such as trivia and literature. However, it has not often been established in niche domains that traditionally require specialized expertise. To this end, we construct the NEPAQuAD1.0 benchmark to evaluate the performance of three frontier LLMs -- Claude Sonnet, Gemini, and GPT-4 -- when answering questions originating from Environmental Impact Statements prepared by U.S. federal government agencies in accordance with the National Environmental Environmental Act (NEPA). We specifically measure the ability of LLMs to understand the nuances of legal, technical, and compliance-related information present in NEPA documents in different contextual scenarios. For example, we test the LLMs' internal prior NEPA knowledge by providing questions without any context, as well as assess how LLMs synthesize the contextual information present in long NEPA documents to facilitate the question/answering task. We compare the performance of the long context LLMs and RAG powered models in handling different types of questions (e.g., problem-solving, divergent). Our results suggest that RAG powered models significantly outperform the long context models in the answer accuracy regardless of the choice of the frontier LLM. Our further analysis reveals that many models perform better answering closed questions than divergent and problem-solving questions.


AURORA: Navigating UI Tarpits via Automated Neural Screen Understanding

Khan, Safwat Ali, Wang, Wenyu, Ren, Yiran, Zhu, Bin, Shi, Jiangfan, McGowan, Alyssa, Lam, Wing, Moran, Kevin

arXiv.org Artificial Intelligence

Nearly a decade of research in software engineering has focused on automating mobile app testing to help engineers in overcoming the unique challenges associated with the software platform. Much of this work has come in the form of Automated Input Generation tools (AIG tools) that dynamically explore app screens. However, such tools have repeatedly been demonstrated to achieve lower-than-expected code coverage - particularly on sophisticated proprietary apps. Prior work has illustrated that a primary cause of these coverage deficiencies is related to so-called tarpits, or complex screens that are difficult to navigate. In this paper, we take a critical step toward enabling AIG tools to effectively navigate tarpits during app exploration through a new form of automated semantic screen understanding. We introduce AURORA, a technique that learns from the visual and textual patterns that exist in mobile app UIs to automatically detect common screen designs and navigate them accordingly. The key idea of AURORA is that there are a finite number of mobile app screen designs, albeit with subtle variations, such that the general patterns of different categories of UI designs can be learned. As such, AURORA employs a multi-modal, neural screen classifier that is able to recognize the most common types of UI screen designs. After recognizing a given screen, it then applies a set of flexible and generalizable heuristics to properly navigate the screen. We evaluated AURORA both on a set of 12 apps with known tarpits from prior work, and on a new set of five of the most popular apps from the Google Play store. Our results indicate that AURORA is able to effectively navigate tarpit screens, outperforming prior approaches that avoid tarpits by 19.6% in terms of method coverage. The improvements can be attributed to AURORA's UI design classification and heuristic navigation techniques.


HKD-SHO: A hybrid smart home system based on knowledge-based and data-driven services

Qiu, Mingming, Najm, Elie, Sharrock, Rémi, Traverson, Bruno

arXiv.org Artificial Intelligence

A smart home is realized by setting up various services. Several methods have been proposed to create smart home services, which can be divided into knowledge-based and data-driven approaches. However, knowledge-based approaches usually require manual input from the inhabitant, which can be complicated if the physical phenomena of the concerned environment states are complex, and the inhabitant does not know how to adjust related actuators to achieve the target values of the states monitored by services. Moreover, machine learning-based data-driven approaches that we are interested in are like black boxes and cannot show the inhabitant in which situations certain services proposed certain actuators' states. To solve these problems, we propose a hybrid system called HKD-SHO (Hybrid Knowledge-based and Data-driven services based Smart HOme system), where knowledge-based and machine learning-based data-driven services are profitably integrated. The principal advantage is that it inherits the explicability of knowledge-based services and the dynamism of data-driven services. We compare HKD-SHO with several systems for creating dynamic smart home services, and the results show the better performance of HKD-SHO.


What is Hiding in Medicine's Dark Matter? Learning with Missing Data in Medical Practices

Suzen, Neslihan, Mirkes, Evgeny M., Roland, Damian, Levesley, Jeremy, Gorban, Alexander N., Coats, Tim J.

arXiv.org Artificial Intelligence

Electronic patient records (EPRs) produce a wealth of data but contain significant missing information. Understanding and handling this missing data is an important part of clinical data analysis and if left unaddressed could result in bias in analysis and distortion in critical conclusions. Missing data may be linked to health care professional practice patterns and imputation of missing data can increase the validity of clinical decisions. This study focuses on statistical approaches for understanding and interpreting the missing data and machine learning based clinical data imputation using a single centre's paediatric emergency data and the data from UK's largest clinical audit for traumatic injury database (TARN). In the study of 56,961 data points related to initial vital signs and observations taken on children presenting to an Emergency Department, we have shown that missing data are likely to be non-random and how these are linked to health care professional practice patterns. We have then examined 79 TARN fields with missing values for 5,791 trauma cases. Singular Value Decomposition (SVD) and k-Nearest Neighbour (kNN) based missing data imputation methods are used and imputation results against the original dataset are compared and statistically tested. We have concluded that the 1NN imputer is the best imputation which indicates a usual pattern of clinical decision making: find the most similar patients and take their attributes as imputation.


Autonomous Catheterization with Open-source Simulator and Expert Trajectory

Jianu, Tudor, Huang, Baoru, Vo, Tuan, Vu, Minh Nhat, Kang, Jingxuan, Nguyen, Hoan, Omisore, Olatunji, Berthet-Rayne, Pierre, Fichera, Sebastiano, Nguyen, Anh

arXiv.org Artificial Intelligence

Endovascular robots have been actively developed in both academia and industry. However, progress toward autonomous catheterization is often hampered by the widespread use of closed-source simulators and physical phantoms. Additionally, the acquisition of large-scale datasets for training machine learning algorithms with endovascular robots is usually infeasible due to expensive medical procedures. In this chapter, we introduce CathSim, the first open-source simulator for endovascular intervention to address these limitations. CathSim emphasizes real-time performance to enable rapid development and testing of learning algorithms. We validate CathSim against the real robot and show that our simulator can successfully mimic the behavior of the real robot. Based on CathSim, we develop a multimodal expert navigation network and demonstrate its effectiveness in downstream endovascular navigation tasks. The intensive experimental results suggest that CathSim has the potential to significantly accelerate research in the autonomous catheterization field. Our project is publicly available at https://github.com/airvlab/cathsim. Endovascular interventions are commonly performed for the diagnosis and treatment of vascular diseases. This intervention involves the utilization of flexible tools, namely guidewires, and catheters. These instruments are introduced into the body via small incisions and manually navigated to specific body regions through the vascular system [69]. Endovascular tool navigation takes approximately 70% of the intervention time and is utilized for a plethora of vascular-related conditions such as peripheral artery disease, aneurysms, and stenosis [49].